Pruning Improves Heuristic Search for Cost-Sensitive Learning

Authors

  • Valentina Bayer Zubek
  • Thomas G. Dietterich
Abstract

This paper addresses cost-sensitive classification in the setting where there are costs for measuring each attribute as well as costs for misclassification errors. We show how to formulate this as a Markov Decision Process in which the transition model is learned from the training data. Specifically, we assume a set of training examples in which all attributes (and the true class) have been measured. We describe a learning algorithm based on the AO∗ heuristic search procedure that searches for the classification policy with minimum expected cost. We provide an admissible heuristic for AO∗ that substantially reduces the number of nodes that need to be expanded, particularly when attribute measurement costs are high. To further prune the search space, we introduce a statistical pruning heuristic based on the principle that if the values of two policies are statistically indistinguishable (on the training data), then we can prune one of the policies from the AO∗ search space. Experiments with realistic and synthetic data demonstrate that these heuristics can substantially reduce the memory needed for AO∗ search without significantly affecting the quality of the learned policy. Hence, these heuristics expand the range of cost-sensitive learning problems for which AO∗ is feasible.
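
To make the statistical pruning idea concrete, the following is a minimal Python sketch of the underlying principle, not the authors' implementation: the per-example costs of two candidate policies are compared with a paired normal-approximation confidence interval, and if the interval on the mean cost difference contains zero, the policies are treated as indistinguishable and one can be discarded. The function name, the paired test, and the 95% critical value are illustrative assumptions; the paper's actual statistical test is not specified in this abstract.

```python
import math

def prune_indistinguishable(costs_a, costs_b, z=1.96):
    """Return True if the two policies' expected costs cannot be distinguished.

    costs_a, costs_b: per-training-example total costs (attribute measurement
    costs plus misclassification cost) incurred by two candidate policies on
    the same examples (assumes at least two examples).
    z: critical value for a normal-approximation confidence interval (~95%).
    """
    n = len(costs_a)
    diffs = [a - b for a, b in zip(costs_a, costs_b)]
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    half_width = z * math.sqrt(var / n)
    # If the confidence interval on the mean cost difference contains zero,
    # treat the policies as statistically indistinguishable; the search can
    # then keep only one of them.
    return abs(mean) <= half_width
```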

Related articles

Cost-sensitive C4.5 with post-pruning and competition

The decision tree is an effective classification approach in data mining and machine learning. In applications, test costs and misclassification costs should be considered while inducing decision trees. Recently, some cost-sensitive learning algorithms based on ID3, such as CS-ID3, IDX, and λ-ID3, have been proposed to deal with this issue. These algorithms handle only symbolic data. In this paper, we ...
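
As a rough illustration of the kind of trade-off these ID3-style algorithms make, the sketch below picks the splitting attribute that maximizes information gain per unit of test cost (an IDX-style ratio). The function names, data layout, and the exact criterion are illustrative assumptions; CS-ID3, IDX, and λ-ID3 each define their own cost-sensitive criterion.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(examples, attr, label_key="class"):
    """Information gain of splitting `examples` (a list of dicts) on `attr`."""
    base = entropy([e[label_key] for e in examples])
    remainder = 0.0
    for value in {e[attr] for e in examples}:
        subset = [e[label_key] for e in examples if e[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder

def best_attribute(examples, test_costs, label_key="class"):
    """Choose the attribute maximizing gain / test cost (an IDX-style ratio)."""
    return max(test_costs,
               key=lambda a: info_gain(examples, a, label_key) / test_costs[a])
```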

Cost-sensitive Decision Trees with Post-pruning and Competition for Numeric Data

The decision tree is an effective classification approach in data mining and machine learning. In some applications, test costs and misclassification costs should be considered while inducing decision trees. Recently, some cost-sensitive learning algorithms based on ID3, such as CS-ID3, IDX, ICET, and λ-ID3, have been proposed to deal with this issue. In this paper, we develop a decision tree algorit...

Real-Time Heuristic Search

We apply the two-player game assumptions of limited search horizon and commitment to moves in constant time to single-agent heuristic search problems. We present a variation of minimax lookahead search, and an analog to alpha-beta pruning that significantly improves the efficiency of the algorithm. Paradoxically, the search horizon reachable with this algorithm increases with increas...

A New Formulation for Cost-Sensitive Two Group Support Vector Machine with Multiple Error Rate

Support vector machine (SVM) is a popular classification technique which classifies data using a max-margin separating hyperplane. The normal vector and bias of this hyperplane are determined by solving a quadratic model, which means that SVM training amounts to an optimization problem. Among the extensions of SVM, the cost-sensitive scheme refers to a model with multiple costs which conside...

Cost-Sensitive Feature Selection of Numeric Data with Measurement Errors

Feature selection is an essential process in data mining applications since it reduces a model's complexity. However, feature selection with various types of costs is still a new research topic. In this paper, we study the cost-sensitive feature selection problem of numeric data with measurement errors. The major contributions of this paper are fourfold. First, a new data model is built to address test...

Journal:

Volume:   Issue:

Pages:  -

Publication date: 2002